
Limitations

Neural Information Processing Systems

While our study identifies clear separations between model hypothesis classes, our best models still have not reached the consistency ceiling of the neural and behavioral benchmarks we compare against. All models were trained simultaneously across all eight scenarios of the Physion Dynamics Training Set, around 16,000 training scenarios in total (2,000 scenes per scenario) [Bear et al., 2021]; each C-SWM [Kipf et al., 2020] model was likewise trained on this set. For each stimulus, we compute the proportion of "hit" responses. The Correlation to Average Human Response is the Pearson's correlation between the model probability-hit vector and the human proportion-hit vector, computed across the stimuli of each scenario. OCP Accuracy of humans and models is the average accuracy across the stimuli of each scenario. To obtain final values for the two quantities, we then compute the weighted mean and s.e.m. of the per-scenario values. Note that these values therefore differ per condition, but are always the same across all models. All neural predictivities are reported on held-out conditions and their timepoints.
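The two evaluation quantities described above can be sketched in a few lines. This is a hypothetical illustration, not the authors' code: the function names and the use of stimulus counts as weights are assumptions.

```python
import numpy as np
from scipy.stats import pearsonr

def correlation_to_avg_human(model_prob_hit, human_prop_hit):
    """Pearson's correlation between the per-stimulus model
    probability-hit vector and the human proportion-hit vector."""
    r, _ = pearsonr(model_prob_hit, human_prop_hit)
    return r

def weighted_mean_sem(per_scenario_values, weights):
    """Weighted mean and s.e.m. of per-scenario values
    (weights assumed to be, e.g., stimulus counts per scenario)."""
    v = np.asarray(per_scenario_values, dtype=float)
    w = np.asarray(weights, dtype=float)
    mean = np.average(v, weights=w)
    var = np.average((v - mean) ** 2, weights=w)  # weighted variance
    sem = np.sqrt(var / len(v))                   # s.e.m. over scenarios
    return mean, sem
```

A model whose hit probabilities vary linearly with the human hit proportions would score a correlation of 1.0 under this metric.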


Not All Splits Are Equal: Rethinking Attribute Generalization Across Unrelated Categories

Fircă, Liviu Nicolae, Bărbălau, Antonio, Oneata, Dan, Burceanu, Elena

arXiv.org Artificial Intelligence

Can models generalize attribute knowledge across semantically and perceptually dissimilar categories? While prior work has addressed attribute prediction within narrow taxonomic or visually similar domains, it remains unclear whether current models can abstract attributes and apply them to conceptually distant categories. This work presents the first explicit evaluation of the robustness of the attribute prediction task under such conditions, testing whether models can correctly infer shared attributes between unrelated object types: e.g., identifying that the attribute "has four legs" is common to both "dogs" and "chairs". To enable this evaluation, we introduce train-test split strategies that progressively reduce correlation between training and test sets, based on: LLM-driven semantic grouping, embedding similarity thresholding, embedding-based clustering, and supercategory-based partitioning using ground-truth labels. Results show a sharp drop in performance as the correlation between training and test categories decreases, indicating strong sensitivity to split design. Among the evaluated methods, clustering yields the most effective trade-off, reducing hidden correlations while preserving learnability. These findings offer new insights into the limitations of current representations and inform future benchmark construction for attribute reasoning.
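The embedding-based clustering strategy can be sketched as follows: cluster category embeddings, then assign whole clusters to either train or test, so that test categories are dissimilar from anything seen during training. This is a minimal illustration under assumed names, not the paper's implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def clustered_split(embeddings, categories, n_clusters=5, test_clusters=1, seed=0):
    """Split categories by assigning whole embedding clusters to
    train or test, reducing hidden train-test correlations.
    embeddings: one vector per category; categories: category names."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(np.asarray(embeddings, dtype=float))
    rng = np.random.default_rng(seed)
    held_out = {int(c) for c in rng.choice(n_clusters, size=test_clusters, replace=False)}
    train = [c for c, l in zip(categories, labels) if int(l) not in held_out]
    test = [c for c, l in zip(categories, labels) if int(l) in held_out]
    return train, test
```

Because entire clusters are held out, no test category has a near-duplicate in the training set, unlike a random per-category split.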


Supplementary: Characterizing Generalization under Out-Of-Distribution Shifts in Deep Metric Learning, Appendix A: Analyzing the model bias for selecting train-test splits

Neural Information Processing Systems

These settings are used throughout our study. In Tab. 1 we show the measured FID scores between each train and test split; for each dataset we show examples of an easy, medium and hard train-test split. Tab. 2 first illustrates the FID scores for all pairwise combinations. However, the fact that FID scores are relatively close to one another despite large semantic differences between datasets may indicate limitations of our utilised FID estimator (Sec.). This section also provides additional results for the experiments presented in Sec. 4 of the main paper; to this end, we provide the exact performance values used to visualize Figure 1 of the main paper in Tab.
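For reference, the FID between two splits is the Fréchet distance between Gaussians fitted to their feature activations: ||mu1 - mu2||^2 + Tr(S1 + S2 - 2(S1 S2)^{1/2}). A minimal sketch of that formula (the estimator choice discussed above, e.g. which feature network to use, is not reproduced here):

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(mu1, sigma1, mu2, sigma2):
    """Frechet Inception Distance between two Gaussians (mean, covariance)
    fitted to the feature activations of two data splits."""
    diff = mu1 - mu2
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts from numerics
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

Identical splits give an FID of zero; larger values indicate a harder train-test shift under this measure.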



Interpretable Generalized Additive Models for Datasets with Missing Values

Neural Information Processing Systems

Many important datasets contain samples that are missing one or more feature values. Maintaining the interpretability of machine learning models in the presence of such missing data is challenging. Singly or multiply imputing missing values complicates the model's mapping from features to labels.



Zero-Shot Performance Prediction for Probabilistic Scaling Laws

Schram, Viktoria, Hiller, Markus, Beck, Daniel, Cohn, Trevor

arXiv.org Artificial Intelligence

The prediction of learning curves for Natural Language Processing (NLP) models enables informed decision-making to meet specific performance objectives, while reducing computational overhead and lowering the costs associated with dataset acquisition and curation. In this work, we formulate the prediction task as a multitask learning problem, where each task's data is modelled as being organized within a two-layer hierarchy. To model the shared information and dependencies across tasks and hierarchical levels, we employ latent variable multi-output Gaussian Processes, which allow us to account for task correlations and support zero-shot prediction of learning curves (LCs). We demonstrate that this approach facilitates the development of probabilistic scaling laws at lower costs. Applying an active learning strategy, LCs can be queried to reduce predictive uncertainty and provide predictions close to ground truth scaling laws. We validate our framework on three small-scale NLP datasets with up to $30$ LCs. These are obtained from nanoGPT models, from bilingual translation using mBART and Transformer models, and from multilingual translation using M2M100 models of varying sizes.
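The core idea, a probabilistic fit to a learning curve whose predictive uncertainty can drive active querying, can be sketched with a single-task GP. This is a simplified stand-in for the paper's latent variable multi-output GP; the data points are hypothetical.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical learning-curve observations: (training-set size, score).
sizes = np.array([1e3, 3e3, 1e4, 3e4, 1e5])
scores = np.array([0.52, 0.61, 0.70, 0.76, 0.80])

# Fit a GP on log-size; the predictive std at unqueried sizes is the
# uncertainty an active learning strategy would reduce by querying LCs.
X = np.log10(sizes).reshape(-1, 1)
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(1e-4), normalize_y=True)
gp.fit(X, scores)
mean, std = gp.predict(np.array([[np.log10(3e5)]]), return_std=True)
```

A multi-output GP extends this by sharing a latent structure across tasks, which is what enables zero-shot prediction of a new task's curve before any of its points have been observed.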



